Clustering based on correlation fractal dimension over an evolving data stream
Abstract
Online clustering of evolving, high-dimensional data is a major challenge for data mining applications. Although many clustering strategies have been proposed, the problem remains open because published algorithms perform poorly on high-dimensional datasets, on finding arbitrarily shaped clusters, and on handling outliers. Knowing the fractal characteristics of a dataset can help to abstract it and can provide useful hints for the clustering process. This paper presents a novel strategy, FractStream, for clustering data streams using fractal dimension, basic window technology, and the damped window model. Core fractal-clusters, progressive fractal-clusters, and outlier fractal-clusters are identified, with the aim of reducing search complexity and execution time. Pruning strategies based on the weight associated with each cluster are also employed, which reduces main-memory usage. An experimental study over a number of datasets demonstrates the effectiveness and efficiency of the proposed technique.
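The two ingredients named in the abstract, correlation fractal dimension and damped-window weighting, can be illustrated with a short sketch. This is not the FractStream algorithm itself, whose details are not given here. It assumes the common definition of the correlation dimension as the slope of log C(r) against log r, where C(r) is the fraction of point pairs within distance r, and it assumes an exponential decay 2^(-decay * age) for the damped window, a form used by other stream-clustering methods. All function names, the decay rate, and the pruning threshold are hypothetical.

```python
import math
import random


def correlation_integral(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is <= r."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= r:
                close += 1
    return close / (n * (n - 1) / 2)


def correlation_dimension(points, radii):
    """Least-squares slope of log C(r) versus log r over the given radii."""
    xs, ys = [], []
    for r in radii:
        c = correlation_integral(points, r)
        if c > 0:  # log(0) is undefined, so skip radii with no close pairs
            xs.append(math.log(r))
            ys.append(math.log(c))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def damped_weight(age, decay=0.1):
    """Assumed damped-window weight: halves every 1/decay time units."""
    return 2.0 ** (-decay * age)


def prune(cluster_weights, threshold):
    """Keep only clusters whose current weight is at least the threshold."""
    return {cid: w for cid, w in cluster_weights.items() if w >= threshold}


if __name__ == "__main__":
    random.seed(0)
    # Points lying on a straight line embedded in 3-dimensional space.
    line = [(t, 2 * t, 3 * t) for t in (random.random() for _ in range(400))]
    print(correlation_dimension(line, [0.05, 0.1, 0.2, 0.4]))
    # Hypothetical clusters last updated 0, 10, and 50 time units ago.
    weights = {"a": damped_weight(0), "b": damped_weight(10), "c": damped_weight(50)}
    print(prune(weights, threshold=0.1))
```

For points on a line the estimated dimension should come out close to 1 even though the data live in three dimensions, which is the kind of summary a fractal-dimension-based method can exploit. The pairwise loop is quadratic in the number of points, so a streaming method would need a cheaper estimator than this one.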
Similar resources
Clustering Multivariate Data Streams by Correlating Attributes using Fractal Dimension
A data stream is a flow of data produced continuously over time. Storing and analyzing such information becomes challenging due to the exponential growth of the volume of data collected. Recently, some algorithms have been proposed to cluster data streams as a whole, but only a few of them deal with multivariate data streams. Even so, these algorithms merely aggregate the attributes without touching...
Particle Swarm Optimized Optimal Threshold Value Selection for Clustering based on Correlation Fractal Dimension
This paper focuses on the use of fractal dimension in clustering evolving data streams. Recently, Anuradha et al. proposed a new approach based on Relative Change in Fractal Dimension (RCFD) and the damped window model for clustering evolving data streams. Based on observations of that approach, this paper shows that the formation of quality clusters is heavily ...
Clustering Multivariate Climate Data Streams using Fractal Dimension
A data stream is a flow of data produced continuously over time. Storing and analyzing such information becomes challenging due to the exponential growth of the volume of data collected. In this context, some methods have been proposed to cluster data streams with similar behavior over time. However, those methods fail to cluster data flows with more than one attribute, i.e., multivariat...
Fast estimation of fractal dimension and correlation integral on stream data
Given a cloud of N points in an E-dimensional space, we often need to estimate the intrinsic dimensionality D of this cloud. For example, a set of points in 3-dimensional space all following along a straight line has intrinsic (or fractal) dimensionality D=1. Non-integer fractal dimensionality appears pervasively in nature. In this paper we give a very fast method to estimate the fractal dimens...
Improving Multivariate Data Streams Clustering
Clustering data streams is an important task in data mining research. Recently, some algorithms have been proposed to cluster data streams as a whole, but only a few of them deal with multivariate data streams. Even so, these algorithms merely aggregate the attributes without touching upon the correlation among them. In order to overcome this issue, we propose a new framework to cluster multivari...
Journal:
- Int. Arab J. Inf. Technol.
Volume 15 (issue not listed)
Pages: not listed
Publication date: 2018